Results 1-20 of 17,724
1.
Database (Oxford) ; 2024, 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38557634

ABSTRACT

The rapid growth in the number of experimental and predicted protein structures, together with their increasing complexity, poses a significant challenge for computational biology in leveraging structural information and accurately representing protein surface properties. Predicted structures for the complete proteomes of various species have recently been released using AlphaFold2, and protein surface property representation plays a crucial role in predicting interactions between proteins and other molecules, including proteins, nucleic acids and compounds. Here, we present ProNet DB, the first extensive database that integrates multiple protein surface representations and the RNA-binding landscape for 326 175 protein structures. This collection encompasses the 16 model organism proteomes from the AlphaFold Protein Structure Database and experimentally validated structures from the Protein Data Bank. For each protein, ProNet DB provides access to the original structure along with detailed surface property representations, encompassing hydrophobicity, charge distribution and hydrogen bonding potential, as well as interaction features such as the interacting face and RNA-binding sites and preferences. To facilitate intuitive interpretation of these properties and the RNA-binding landscape, ProNet DB incorporates visualization tools such as Mol* and an Online 3D Viewer, allowing these representations to be observed and analysed directly on protein surfaces. Pre-computed features give users instant access, supporting computational biology research in areas such as molecular mechanism elucidation, geometry-based drug discovery and the development of novel therapeutic approaches. Database URL: https://proj.cse.cuhk.edu.hk/aihlab/pronet/.


Subjects
Proteome , RNA , Binding Sites , Databases, Protein , RNA/chemistry , Membrane Proteins , Surface Properties
2.
Biomolecules ; 14(3), 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38540707

ABSTRACT

Disordered linkers (DLs) are intrinsically disordered regions that facilitate movement between adjacent functional regions/domains, contributing to many key cellular functions. The recently completed second Critical Assessment of protein Intrinsic Disorder prediction (CAID2) evaluated DL predictions in a rather narrow scenario: predicting DLs in 40 proteins already known to have them. We expand this evaluation by using a much larger set of nearly 350 test proteins from CAID2 and by investigating three distinct scenarios: (1) prediction of residues in DLs vs. in non-DL regions (the typical use of DL predictors); (2) prediction of residues in DLs vs. other disordered residues (to evaluate whether predictors can differentiate residues in DLs from other types of intrinsically disordered residues); and (3) prediction of proteins harboring DLs. We find that several methods provide relatively accurate predictions of DLs in the first scenario. However, only one method, APOD, accurately identifies DLs among other types of disordered residues (scenario 2) and predicts proteins harboring DLs (scenario 3). We also find that APOD's predictive performance is modest, motivating further research into new and more accurate DL predictors. We note that these efforts will benefit from a growing amount of training data and the availability of sophisticated deep network models, and we emphasize that future methods should provide accurate results across all three scenarios.


Subjects
Computational Biology , Intrinsically Disordered Proteins , Computational Biology/methods , Proteins/chemistry , Intrinsically Disordered Proteins/chemistry , Databases, Protein
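The three scenarios differ only in which residues (or proteins) count as negatives. The sketch below makes that concrete, assuming a hypothetical per-residue label string (`L` = disordered linker, `D` = other disordered, `O` = ordered) and using ROC AUC as an illustrative metric. It is not the study's evaluation code or necessarily its metric set, and scoring a protein by its maximum residue score in scenario 3 is also an assumption.

```python
def auc(pos, neg):
    """ROC AUC: probability that a random positive outscores a random negative (ties count 0.5)."""
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def scenario1(labels, scores):
    """Linker residues vs. all non-linker residues (ordered + other disordered)."""
    pos = [s for lab, s in zip(labels, scores) if lab == "L"]
    neg = [s for lab, s in zip(labels, scores) if lab != "L"]
    return auc(pos, neg)


def scenario2(labels, scores):
    """Linker residues vs. other disordered residues only (ordered residues ignored)."""
    pos = [s for lab, s in zip(labels, scores) if lab == "L"]
    neg = [s for lab, s in zip(labels, scores) if lab == "D"]
    return auc(pos, neg)


def scenario3(proteins):
    """Proteins with at least one linker residue vs. proteins without any.

    `proteins` is a list of (labels, scores) pairs; each protein is scored by
    its highest residue score (a simplifying assumption)."""
    pos, neg = [], []
    for labels, scores in proteins:
        (pos if "L" in labels else neg).append(max(scores))
    return auc(pos, neg)


# Toy predictor that scores every disordered residue highly, linker or not.
labels = "OODDLL"
scores = [0.1, 0.2, 0.8, 0.9, 0.85, 0.7]
print(scenario1(labels, scores))  # 0.625
print(scenario2(labels, scores))  # 0.25
```

On this toy data the same predictor looks better than chance in scenario 1 but falls below chance in scenario 2, the kind of gap that separating the scenarios is meant to expose.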
3.
Bioinformatics ; 40(4), 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38532297

ABSTRACT

MOTIVATION: Computational methods to detect correlated amino acid positions in proteins have become a valuable tool to predict intra- and inter-protein residue contacts, protein structures, and effects of mutation on protein stability and function. While many tools and webservers compute coevolution scoring matrices, there is no central repository of alignments and coevolution matrices for large-scale studies and pattern detection that leverages the biological and structural annotations already available in UniProt. RESULTS: We present a Python library, PyCoM, which enables users to query and analyze coevolution matrices and sequence alignments of 457 622 proteins, selected from the UniProtKB/Swiss-Prot database (length ≤ 500 residues), from a precompiled coevolution matrix database (PyCoMdb). PyCoM facilitates statistical analyses of residue coevolution patterns using filters on biological and structural annotations from UniProtKB/Swiss-Prot, with simple access to PyCoMdb for both novice and advanced users through Jupyter Notebooks, Python scripts and a web API. The resource is open source and will help in generating data-driven computational models and methods to study and understand protein structure, stability, function and design. AVAILABILITY AND IMPLEMENTATION: PyCoM code is freely available from https://github.com/scdantu/pycom and PyCoMdb and the Jupyter Notebook tutorials are freely available from https://pycom.brunel.ac.uk.


Subjects
Proteins , Software , Proteins/chemistry , Sequence Alignment , Amino Acids , Databases, Protein
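To illustrate what a coevolution matrix contains, the sketch below scores every pair of columns in a toy alignment with mutual information (MI). MI is a deliberately basic, swapped-in coevolution measure: the method behind PyCoMdb may differ, practical tools add corrections (for example, the average product correction) or use direct-coupling analysis, and none of this is PyCoM's API.

```python
import math
from collections import Counter
from itertools import combinations


def column_mi(msa, i, j):
    """Mutual information (in nats) between columns i and j of an alignment.

    `msa` is a list of equal-length strings; gap characters are treated like
    any other symbol here."""
    n = len(msa)
    count_i = Counter(seq[i] for seq in msa)
    count_j = Counter(seq[j] for seq in msa)
    count_ij = Counter((seq[i], seq[j]) for seq in msa)
    mi = 0.0
    for (a, b), c in count_ij.items():
        p_ab = c / n
        mi += p_ab * math.log(p_ab / ((count_i[a] / n) * (count_j[b] / n)))
    return mi


def mi_matrix(msa):
    """Symmetric L x L matrix of column-pair MI scores (diagonal left at 0)."""
    length = len(msa[0])
    matrix = [[0.0] * length for _ in range(length)]
    for i, j in combinations(range(length), 2):
        matrix[i][j] = matrix[j][i] = column_mi(msa, i, j)
    return matrix


# Columns 0 and 1 co-vary perfectly (A<->K, G<->R); column 2 varies
# independently of them; column 3 is fully conserved.
msa = ["AKLE", "AKVE", "GRLE", "GRVE"]
m = mi_matrix(msa)
print(round(m[0][1], 3))  # 0.693 (= ln 2)
```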
4.
Brief Bioinform ; 25(2), 2024 Jan 22.
Article in English | MEDLINE | ID: mdl-38426325

ABSTRACT

Accurate metabolite annotation and false discovery rate (FDR) control remain challenging in large-scale metabolomics. Recent progress drawing on experience from proteomics and on insights from other disciplines has been valuable. Although target-decoy strategies have been introduced, generating reliable decoy libraries is difficult because of the complexity of metabolites. Moreover, continued bioinformatics innovation is needed to make better use of expanding spectral resources while reducing false annotations. Here, we introduce the concept of ion entropy for metabolomics and propose two entropy-based decoy generation approaches. Assessment of public databases validates ion entropy as an effective metric for quantifying ion information in massive metabolomics datasets. Our entropy-based decoy strategies outperform current representative methods in metabolomics and achieve more accurate FDR estimation. Analysis of 46 public datasets provides practical recommendations for application.


Subjects
Algorithms , Tandem Mass Spectrometry , Entropy , Tandem Mass Spectrometry/methods , Metabolomics/methods , Computational Biology/methods , Databases, Protein
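As a rough illustration of using entropy to quantify the information in a spectrum, the sketch below computes the Shannon entropy of normalized peak intensities. This is an assumption: the paper's "ion entropy" and its two decoy-generation approaches may be defined differently, and nothing here reproduces them.

```python
import math


def spectral_entropy(intensities):
    """Shannon entropy (in nats) of peak intensities after normalizing them to sum to 1."""
    total = float(sum(intensities))
    if total <= 0:
        raise ValueError("spectrum must have a positive total intensity")
    entropy = 0.0
    for x in intensities:
        if x > 0:  # zero-intensity peaks contribute nothing (0 * log 0 is taken as 0)
            p = x / total
            entropy -= p * math.log(p)
    return entropy


# Evenly spread intensity gives high entropy; one dominant peak gives low entropy.
print(round(spectral_entropy([1, 1, 1, 1]), 3))  # 1.386 (= ln 4)
print(spectral_entropy([100, 1, 1]) < spectral_entropy([4, 4, 4]))  # True
```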
5.
Microbiome ; 12(1): 58, 2024 Mar 19.
Article in English | MEDLINE | ID: mdl-38504332

ABSTRACT

BACKGROUND: Microbiota are closely associated with human health and disease. Metaproteomics provides a direct means to identify microbial proteins in microbiota for compositional and functional characterization. However, in-depth and accurate metaproteomics remains limited by the extreme complexity and high diversity of microbiota samples. It is generally recommended to use metagenomic data from the same samples to construct the protein sequence database for metaproteomic data analysis. Although different metagenomics-based database construction strategies have been developed, optimization of gene taxonomic annotation, which is extremely important for accurate metaproteomic analysis, has not been reported. RESULTS: Herein, we proposed an accurate taxonomic annotation pipeline for genes from metagenomic data, named contigs directed gene annotation (ConDiGA), and used it to build a protein sequence database for metaproteomic analysis. We compared our pipeline (ConDiGA, or MD3) with two other popular annotation pipelines (MD1 and MD2). In MD1, genes were annotated directly against the whole bacterial genome database; in MD2, contigs were annotated against the whole bacterial genome database and the taxonomic information of each contig was assigned to its genes; in MD3, the most confident species from the contig annotation results were used as the reference for annotating genes. Annotation tools, including BLAST, Kaiju and Kraken2, were compared. On a synthetic microbial community of 12 species, Kaiju with the MD3 pipeline outperformed the other combinations in constructing a protein sequence database from metagenomic data. Similar performance was observed with a fecal sample, as well as with in silico mixed datasets of the simulated microbial community and the fecal sample. CONCLUSIONS: Overall, we developed an optimized pipeline for gene taxonomic annotation to construct protein sequence databases. Our study addresses the current problem of taxonomic annotation reliability in metagenomics-derived protein sequence databases and can promote in-depth metaproteomic analysis of microbiomes. The unique metagenomic and metaproteomic datasets of the 12 bacterial species are publicly available as a standard benchmarking sample for evaluating analysis pipelines. The ConDiGA code is openly available on GitHub for the analysis of microbiota samples. Video Abstract.


Subjects
Microbiota , Humans , Databases, Protein , Molecular Sequence Annotation , Reproducibility of Results , Microbiota/genetics , Metagenome/genetics , Bacteria/genetics , Metagenomics/methods
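The MD3 idea (restrict gene annotation to species that are well supported at the contig level) can be sketched in a few lines. The input dictionaries stand in for parsed BLAST, Kaiju or Kraken2 output, and the "at least N supporting contigs" rule is a hypothetical stand-in for "most confident species"; the function names and data are illustrative and are not ConDiGA's code.

```python
from collections import Counter


def reference_species(contig_calls, min_contigs=2):
    """Species assigned to at least `min_contigs` contigs.

    `contig_calls` maps contig ID -> species called for that contig. Requiring
    several supporting contigs is a hypothetical confidence rule."""
    counts = Counter(contig_calls.values())
    return {species for species, n in counts.items() if n >= min_contigs}


def annotate_genes(gene_hits, allowed_species):
    """MD3-style step: best-scoring hit per gene, restricted to reference species.

    `gene_hits` maps gene ID -> list of (species, score) hits. Genes with no
    hit among the reference species are left unassigned (None)."""
    assignments = {}
    for gene, hits in gene_hits.items():
        candidates = [(score, species) for species, score in hits if species in allowed_species]
        assignments[gene] = max(candidates)[1] if candidates else None
    return assignments


contig_calls = {
    "c1": "Escherichia coli", "c2": "Escherichia coli",
    "c3": "Bacteroides fragilis", "c4": "Bacteroides fragilis",
    "c5": "Shigella flexneri",  # supported by a single contig only
}
gene_hits = {
    "g1": [("Shigella flexneri", 98.0), ("Escherichia coli", 97.5)],
    "g2": [("Bacteroides fragilis", 90.0)],
    "g3": [("Shigella flexneri", 99.0)],
}
refs = reference_species(contig_calls)
print(annotate_genes(gene_hits, refs))
# {'g1': 'Escherichia coli', 'g2': 'Bacteroides fragilis', 'g3': None}
```

An MD1-style direct annotation would instead assign g1 to its overall best hit (Shigella flexneri); narrowing the search space to contig-supported species is what changes the outcome here.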
6.
Sci Data ; 11(1): 281, 2024 Mar 08.
Article in English | MEDLINE | ID: mdl-38459036

ABSTRACT

Organelles do not act as autonomous discrete units but as interconnected hubs that communicate extensively by forming close contacts called "membrane contact sites" (MCSs). Many proteins have been identified as residing at MCSs and playing important roles in maintaining and fulfilling specific functions within these microdomains. However, a comprehensive compilation of these MCS proteins is still lacking. Therefore, we developed MCSdb, a manually curated resource of MCS proteins and complexes from publications. MCSdb documents 7010 MCS protein entries and 263 complexes, involving 24 organelles and 44 MCSs across 11 species. Additionally, MCSdb organizes all data into categories, with extensive information presented for each MCS protein. In summary, MCSdb provides a valuable resource for accelerating the functional interpretation of MCSs and the deciphering of interorganelle communication.


Subjects
Cell Membrane , Databases, Protein , Organelles , Proteins , Organelles/chemistry , Cell Membrane/chemistry , Proteins/chemistry
7.
Microbiome ; 12(1): 46, 2024 Mar 07.
Article in English | MEDLINE | ID: mdl-38454512

ABSTRACT

BACKGROUND: By analyzing proteins, the workhorses of biological systems, metaproteomics allows us to list the taxa present in any microbiota, monitor their relative biomass, and characterize the functioning of complex biological systems. RESULTS: Here, we present a new strategy for rapidly determining the microbial community structure of a given sample and designing a customized protein sequence database to optimally exploit extensive tandem mass spectrometry data. This approach leverages the capabilities of the first-generation Quadrupole Orbitrap mass spectrometer incorporating an asymmetric track lossless (Astral) analyzer, which offers fast MS/MS scan speed and high sensitivity. We took advantage of data-dependent acquisition (DDA) and data-independent acquisition (DIA) strategies using a peptide extract from a human fecal sample spiked with precise amounts of peptides from two reference bacteria. CONCLUSIONS: Our approach, which combines both acquisition methods, proves time-efficient when processing extensive generic databases and massive datasets, achieving coverage of more than 122,000 unique peptides and 38,000 protein groups within a 30-min DIA run. This marks a significant departure from current state-of-the-art metaproteomics methodologies, resulting in broader coverage of the metabolic pathways governing the biological system. In combination, our strategy and the Astral mass analyzer represent a quantum leap in the functional analysis of microbiomes. Video Abstract.


Subjects
Microbiota , Tandem Mass Spectrometry , Humans , Tandem Mass Spectrometry/methods , Proteomics/methods , Peptides , Databases, Protein
8.
Bioinformatics ; 40(4), 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38547405

ABSTRACT

MOTIVATION: Protein sequence database search and multiple sequence alignment generation are fundamental tasks in many bioinformatics analyses. As the volume of sequence data continues to grow rapidly, there is an increasing need for efficient and scalable algorithms that query many sequences against super-large databases without high time and computational costs. RESULTS: We introduce Chorus, a novel protein sequence query system that leverages a parallel model and a heterogeneous computing architecture to let users query thousands of protein sequences concurrently against large protein databases on a desktop workstation. Chorus achieves over 100× speedup over BLASTP without sacrificing sensitivity. We demonstrate the utility of Chorus through a case study analyzing a ∼1.5-TB metagenomic dataset for novel CRISPR-Cas protein discovery within 30 min. AVAILABILITY AND IMPLEMENTATION: Chorus is open-source and its code repository is available at https://github.com/Bio-Acc/Chorus.


Subjects
Algorithms , Software , Amino Acid Sequence , Proteins , Databases, Protein
9.
FEBS Lett ; 598(7): 725-742, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38439692

ABSTRACT

Protein-protein interactions (PPIs) are often mediated by a short linear motif (SLiM) in one protein and a domain in another, known as domain-motif interactions (DMIs). Over the past decade, SLiMs have been studied for their roles in cellular functions such as post-translational modifications, regulatory processes, protein scaffolding, cell cycle progression, cell adhesion, cell signalling and substrate selection for proteasomal degradation. This review provides a comprehensive overview of current PPI detection techniques and resources, focusing on their relevance to capturing interactions mediated by SLiMs. We also address the challenges associated with capturing DMIs. Moreover, a case study analysing the BioGRID database as a source for DMI prediction revealed significant enrichment of known DMIs among interactions found by different PPI detection methods. Overall, current high-throughput PPI detection methods can be a reliable source for predicting DMIs.


Subjects
Protein Interaction Mapping , Proteins , Protein Interaction Domains and Motifs , Proteins/metabolism , Databases, Protein
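Enrichment of known DMIs among the interactions reported by one detection method is commonly assessed with a one-sided hypergeometric test. The abstract does not state which statistical procedure the case study used, so the choice of test and the variable meanings below are assumptions; the sketch only shows the calculation with Python's standard library.

```python
from math import comb


def hypergeom_sf(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws).

    Here (hypothetically): N = all PPIs considered, K = PPIs matching a known DMI,
    n = PPIs detected by one method, k = known-DMI PPIs among them."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def fold_enrichment(N, K, n, k):
    """Fraction of known DMIs among the method's PPIs relative to the background fraction."""
    return (k / n) / (K / N)


# Illustrative counts only: 20 of a method's 100 PPIs match known DMIs,
# against a background of 50 out of 1000.
print(fold_enrichment(1000, 50, 100, 20))
print(hypergeom_sf(1000, 50, 100, 20))
```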
10.
Nat Methods ; 21(3): 477-487, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38326495

ABSTRACT

Deep learning models, such as AlphaFold2 and RoseTTAFold, enable high-accuracy protein structure prediction. However, large protein complexes are still challenging to predict due to their size and the complexity of interactions between multiple subunits. Here we present CombFold, a combinatorial and hierarchical assembly algorithm for predicting structures of large protein complexes utilizing pairwise interactions between subunits predicted by AlphaFold2. CombFold accurately predicted (TM-score >0.7) 72% of the complexes among its top-10 predictions in two datasets of 60 large, asymmetric assemblies. Moreover, the structural coverage of predicted complexes was 20% higher than that of the corresponding Protein Data Bank entries. We applied the method to complexes from Complex Portal with known stoichiometry but unknown structure and obtained high-confidence predictions. CombFold supports the integration of distance restraints based on crosslinking mass spectrometry and fast enumeration of possible complex stoichiometries. Its high accuracy makes CombFold a promising tool for expanding structural coverage beyond monomeric proteins.


Subjects
Algorithms , Databases, Protein , Mass Spectrometry
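For readers weighing the "TM-score >0.7" criterion above, the standard single-chain TM-score definition is given below as background. It ranges from 0 to 1, and values above roughly 0.5 are conventionally taken to indicate the same fold. This is not taken from the CombFold paper; complex-level evaluations may use a multi-chain variant of the score, and the abstract does not name the scoring tool.

```latex
\mathrm{TM\text{-}score} = \max\left[\frac{1}{L_{\mathrm{target}}}\sum_{i=1}^{L_{\mathrm{ali}}}\frac{1}{1+\left(\frac{d_i}{d_0(L_{\mathrm{target}})}\right)^{2}}\right],
\qquad
d_0(L_{\mathrm{target}}) = 1.24\sqrt[3]{L_{\mathrm{target}}-15}-1.8
```

Here L_target is the length of the reference structure, L_ali the number of aligned residue pairs, d_i the distance (in Å) between the i-th aligned pair after superposition, and the maximum is taken over superpositions.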
11.
J Ethnopharmacol ; 326: 117959, 2024 May 23.
Article in English | MEDLINE | ID: mdl-38423413

ABSTRACT

ETHNOPHARMACOLOGICAL RELEVANCE: Compound Jixuecao Decoction (CJD) is a traditional Chinese herbal medicine prescribed in China to treat chronic renal failure (CRF). Previous studies have shown that CJD affects cell apoptosis and proliferation. However, the mechanism of its renal protective action has not been characterized. AIM OF THE STUDY: To explore the mechanism(s) underlying the effect of CJD on endoplasmic reticulum stress (ERS) and apoptosis in the treatment of CRF using network pharmacology, molecular docking, molecular dynamics simulations, and in vivo studies. MATERIALS AND METHODS: The compounds comprising CJD were retrieved from the Traditional Chinese Medicine Systems Pharmacology Database. The SwissTargetPrediction database and a similarity integration approach were used to identify potential targets of these components. The GeneCards and DisGeNET databases were used to identify targets associated with CRF, apoptosis, and ERS. The STRING database was used to analyze the protein-protein interactions (PPIs) among the overlapping drug and disease targets. A chemical composition-shared target network was established, and critical pathways were identified through Gene Ontology and Kyoto Encyclopedia of Genes and Genomes enrichment analyses. Structures of key proteins were retrieved from the Protein Data Bank, and molecular docking and dynamics simulations were performed between the top four CJD active ingredients and proteins involved in apoptosis and ERS in CRF. Subsequent in vivo studies using a 5/6 nephrectomy rat model of CRF were performed to verify the findings. RESULTS: The 80 compounds identified in CJD yielded 875 target genes, of which 216 were potentially related to CRF. PPI network analysis revealed key targets via topology filtering. Enrichment analysis, molecular docking, and molecular dynamics simulation results suggested that CJD primarily targets mitofusin-2 (MFN2), B-cell lymphoma-2 (BCL2), BAX, protein kinase RNA-like ER kinase (PERK), and C/EBP homologous protein (CHOP) during CRF treatment. In vivo, CJD significantly increased the abundance of MFN2 and BCL2 and significantly reduced that of BAX, PERK, and CHOP in kidney tissues, indicating that CJD could attenuate apoptosis and ERS in CRF rats. CONCLUSIONS: This study provides evidence that CJD effectively delays CRF through modulation of the MFN2 and PERK-eIF2α-ATF4-CHOP signaling pathways.


Subjects
Drugs, Chinese Herbal , Kidney Failure, Chronic , Renal Insufficiency, Chronic , Animals , Rats , Molecular Docking Simulation , bcl-2-Associated X Protein , Endoplasmic Reticulum Stress , Apoptosis , Databases, Protein , Medicine, Chinese Traditional , Drugs, Chinese Herbal/pharmacology , Drugs, Chinese Herbal/therapeutic use
12.
Bioinformatics ; 40(3), 2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38377393

ABSTRACT

MOTIVATION: Eukaryotic linear motifs (ELMs), or short linear motifs, are protein interaction modules that play an essential role in cellular processes and signaling networks and are often involved in diseases such as cancer. The ELM database is a collection of manually curated motif knowledge from scientific papers. It has become a crucial resource for investigating motif biology and recognizing candidate ELMs in novel amino acid sequences. Users can search amino acid sequences or UniProt Accessions on the ELM resource web interface. However, as with many web services, large-scale queries cannot be processed swiftly through the ELM web interface or API calls, which limits integration into protein function analysis pipelines. RESULTS: To allow swift, large-scale motif analyses on protein sequences using the ELMs curated in the ELM database, we have extended the gget suite of Python and command-line tools with a new module, gget elm, which efficiently finds candidate ELMs in user-submitted amino acid sequences and UniProt Accessions without relying on the ELM server. gget elm increases accessibility to the information stored in the ELM database and allows scalable searches for motif-mediated interaction sites in amino acid sequences. AVAILABILITY AND IMPLEMENTATION: The manual and source code are available at https://github.com/pachterlab/gget.


Subjects
Proteins , Software , Amino Acid Motifs , Databases, Protein , Proteins/chemistry , Amino Acid Sequence
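ELM motif classes are described by regular expressions, so the core of a local, server-free motif scan can be sketched with Python's `re` module. The two patterns below are simplified placeholders chosen for illustration, not entries copied from the ELM database, and this is not how gget elm is implemented. A regex match is only a candidate site; whether it is functional depends on context not modelled here.

```python
import re


def find_motifs(sequence, motifs):
    """Scan a protein sequence with regular-expression motif patterns.

    `motifs` maps a motif name to a regex. Returns (name, start, end, match)
    tuples with 1-based, inclusive positions, sorted by start position."""
    seq = sequence.upper()
    hits = []
    for name, pattern in motifs.items():
        # Wrapping the pattern in a zero-width lookahead reports overlapping
        # matches, which a plain re.finditer(pattern, seq) would skip.
        for m in re.finditer(f"(?=({pattern}))", seq):
            start = m.start(1)
            matched = m.group(1)
            hits.append((name, start + 1, start + len(matched), matched))
    return sorted(hits, key=lambda h: (h[1], h[0]))


# Illustrative placeholder patterns (hypothetical, not from the ELM database).
motifs = {"PxxP_example": "P..P", "RGD_example": "RGD"}
print(find_motifs("MPAPPLPRGDK", motifs))
# [('PxxP_example', 2, 5, 'PAPP'), ('PxxP_example', 4, 7, 'PPLP'), ('RGD_example', 8, 10, 'RGD')]
```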
13.
PLoS Comput Biol ; 20(2): e1011586, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38416793

ABSTRACT

Protein structure prediction has now been deployed widely across several large protein sets. Large-scale domain annotation of these predictions can aid the development of biological insights. Using our Evolutionary Classification of Protein Domains (ECOD), built from experimental structures, as a basis for classification, we describe the detection and cataloging of domains from 48 whole proteomes deposited in the AlphaFold Database. On average, we provide positive classification (either of domains or of other identifiable non-domain regions) for 90% of residues across all proteomes. We classified 746,349 domains from 536,808 proteins comprising over 226,424,000 amino acid residues. We examine the varying populations of homologous groups in both eukaryotes and bacteria. In addition to containing a higher fraction of disordered regions and unassigned domains, eukaryotes show a higher proportion of repeat proteins, both globular and small repeats. We enumerate the highly populated domains that are shared by eukaryotes and bacteria, such as Rossmann domains, TIM barrels and P-loop domains. Additionally, we compare the sampling of homologous groups in this whole-proteome set against our stable ECOD reference and discuss groups that have been enriched by structure predictions. Finally, we discuss the implications of these results for protein target selection in future classification strategies for very large protein sets.


Subjects
Biological Evolution , Proteome , Protein Domains , Evolution, Molecular , Bacteria , Databases, Protein
14.
Database (Oxford) ; 2024, 2024 Feb 12.
Article in English | MEDLINE | ID: mdl-38345567

ABSTRACT

Detecting changes in the dynamics of secreted proteins in serum has been a challenge for proteomics. Here we present the secreted protein database (SEPDB), an integrated secretory proteomics database offering human, mouse and rat secretory proteomics datasets collected from serum, exosomes and cell culture media. SEPDB compiles secreted protein information from the secreted protein database, UniProt and the Human Protein Atlas to annotate secretory proteomics data based on protein subcellular localization and disease markers. SEPDB integrates the latest predictive modeling techniques to measure deviations in the distribution of signal peptide structures of secreted proteins, extends signal peptide sequence prediction by excluding proteins with transmembrane domains and updates the validation analysis pipeline for secreted proteins. To establish tissue-specific profiles, we have also created secretory proteomics datasets associated with different human tissues. In addition, we provide information on the organization of heterogeneous receptor networks, reflecting the complex functional information inherent in the molecular structures of secreted proteins that serve as ligands. Users can take advantage of SEPDB's refreshed Search, Analyze, Browse and Download functions. Database URL: https://sysomics.com/SEPDB/.


Subjects
Proteins , Proteomics , Animals , Mice , Rats , Humans , Databases, Protein , Proteins/chemistry , Proteomics/methods , Protein Sorting Signals
15.
BMC Res Notes ; 17(1): 50, 2024 Feb 16.
Article in English | MEDLINE | ID: mdl-38365785

ABSTRACT

OBJECTIVE: The protein kinase superfamily features a common protein kinase-like (PKL) three-dimensional fold. Proteins with the PKL fold can also possess enzymatic activities other than protein phosphorylation, such as AMPylation or glutamylation. PKL proteins play vital roles across living organisms, contributing to the survival of pathogenic bacteria inside host cells and being involved in carcinogenesis and neurological diseases in humans. The PKL superfamily is constantly growing; therefore, it is crucial to gather new information about PKL families. RESULTS: To this end, the KINtaro database (http://bioinfo.sggw.edu.pl/kintaro/) has been created as a resource for collecting and sharing such information. KINtaro combines protein sequence information and additional annotations for more than 70 PKL families, including 32 families not associated with the PKL superfamily in established protein domain databases. KINtaro is searchable by keyword and by protein sequence and provides family descriptions, sequences, sequence alignments, HMM models, 3D structure models, experimental structures with PKL domain annotations, and sequence logos with catalytic residue annotations.


Subjects
Protein Kinases , Proteins , Humans , Protein Kinases/genetics , Phosphorylation , Amino Acid Sequence , Sequence Alignment , Databases, Protein
16.
Sci Rep ; 14(1): 3112, 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38326407

ABSTRACT

Corticotropin-releasing hormone-binding protein (CRHBP) is involved in many physiological processes. However, its role in tumor immunity and prognosis prediction remains unclear. Using databases such as The Cancer Genome Atlas (TCGA), Gene Expression Omnibus (GEO), the Tumor Protein Database, TIMER, and Gene Expression Profiling Interactive Analysis (GEPIA), we evaluated the potential role of CRHBP in diverse cancers. We further examined the relationships between CRHBP and tumor survival prognosis, immune infiltration, immune checkpoint (ICP) markers, tumor mutation burden (TMB), microsatellite instability (MSI), mismatch repair (MMR), DNA methylation, the tumor microenvironment (TME), and drug responsiveness. The anticancer effect of CRHBP in liver hepatocellular carcinoma (LIHC) was shown by Western blotting, EdU staining, JC-1 staining, transwell assays, and wound healing assays. CRHBP expression is significantly low in the majority of tumor types and is associated with survival prognosis, ICP markers, TMB, and MSI. CRHBP expression was significantly associated with the abundance of six immune cell types, as well as with stromal and immune scores, suggesting that CRHBP has a substantial impact on the TME. We also observed an association between CRHBP expression levels and the IC50 of a number of anticancer drugs. CRHBP-related signaling pathways were identified using functional enrichment analysis. Cox regression analysis showed that CRHBP expression was an independent prognostic factor for LIHC. Cell and molecular biology experiments indicated that CRHBP has a tumor suppressor function in LIHC. CRHBP has a significant impact on tumor immunity, treatment, and prognosis, and has potential as a cancer treatment target and prognostic indicator.


Subjects
Carcinoma, Hepatocellular , Liver Neoplasms , Humans , Carcinoma, Hepatocellular/drug therapy , Carcinoma, Hepatocellular/genetics , Microsatellite Instability , Prognosis , Databases, Protein , Liver Neoplasms/drug therapy , Liver Neoplasms/genetics , Tumor Microenvironment/genetics
17.
Molecules ; 29(4), 2024 Feb 13.
Article in English | MEDLINE | ID: mdl-38398585

ABSTRACT

The prediction of three-dimensional (3D) protein structure from amino acid sequences has stood as a significant challenge in computational and structural bioinformatics for decades. Recently, the widespread integration of artificial intelligence (AI) algorithms has substantially accelerated advances in protein structure prediction, yielding numerous significant milestones. In particular, the end-to-end deep learning method AlphaFold2 raised structure prediction performance to new heights, producing models often competitive with experimental structures in the 14th Critical Assessment of Protein Structure Prediction (CASP14). To provide a comprehensive understanding and guide future research in the field of protein structure prediction, this review describes various methodologies, assessments, and databases in protein structure prediction, including traditionally used methods, such as template-based modeling (TBM) and template-free modeling (FM) approaches; recently developed deep learning-based methods, such as contact/distance-guided methods, end-to-end folding methods, and protein language model (PLM)-based methods; multi-domain protein structure prediction methods; the CASP experiments and related assessments; and the recently released AlphaFold Protein Structure Database (AlphaFold DB). We discuss their advantages, disadvantages, and application scopes, aiming to help researchers understand the limitations and contexts of protein structure prediction methods and select among them effectively in protein-related fields.


Subjects
Artificial Intelligence , Proteins , Protein Conformation , Models, Molecular , Proteins/chemistry , Algorithms , Computational Biology/methods , Databases, Protein , Software , Protein Folding
18.
Anal Biochem ; 688: 115483, 2024 May.
Article in English | MEDLINE | ID: mdl-38360171

ABSTRACT

Circular dichroism (CD) is widely used to rapidly assess protein structure. Deconvolution of the far-UV CD spectrum is widely used to quantify secondary structure elements (SSEs), and multiple algorithms are available for this. Imperfections in experimental CD spectra arising from spectral noise, instrument miscalibration, spectral offsets and non-linearity affect the accuracy and precision of the derived secondary structure estimates. Analytical validation for use in regulated environments, such as biopharmaceuticals, requires that the impact of imperfect data on these estimates be understood, yet little information on this impact was available. A series of noise-free simulated spectral datasets with modified intensity, wavelength, noise, intensity linearity and offsets was created from entries in the Protein Circular Dichroism Data Bank. These datasets were analysed using the BeStSel online resource to estimate secondary structure. Data imperfections caused significant changes in the estimated SSEs, and the spectral range used also matters. This study emphasises the importance of analytical method validation and justifiable estimates of uncertainty when reporting results. The datasets created are made available as a resource to validate other secondary structure estimation programs.


Subjects
Algorithms , Proteins , Circular Dichroism , Proteins/chemistry , Protein Structure, Secondary , Databases, Protein
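The perturbation study described above can be sketched with NumPy: starting from a reference spectrum, apply intensity scaling, a baseline offset, a wavelength shift, a simple non-linearity and additive noise. The parameter names, the quadratic non-linearity term and the clamping at the spectrum edges are assumptions made for illustration, not the authors' procedure.

```python
import numpy as np


def perturb_spectrum(wavelengths, cd, scale=1.0, offset=0.0, shift_nm=0.0,
                     quad=0.0, noise_sd=0.0, seed=None):
    """Return a perturbed copy of a CD spectrum (hypothetical perturbation model).

    scale    - intensity miscalibration (multiplicative)
    offset   - constant baseline offset
    shift_nm - wavelength miscalibration: the value reported at wavelength w
               is the true value at w - shift_nm
    quad     - quadratic term, one simple way to model intensity non-linearity
    noise_sd - standard deviation of additive Gaussian noise
    """
    wl = np.asarray(wavelengths, dtype=float)
    y = np.asarray(cd, dtype=float)
    # np.interp needs increasing x values; CD data are often listed high-to-low.
    order = np.argsort(wl)
    # Outside the measured range np.interp clamps to the end values.
    shifted = np.interp(wl - shift_nm, wl[order], y[order])
    out = scale * shifted + quad * shifted ** 2 + offset
    if noise_sd > 0:
        rng = np.random.default_rng(seed)
        out = out + rng.normal(0.0, noise_sd, size=out.shape)
    return out


wl = np.arange(190.0, 261.0)          # 190-260 nm in 1 nm steps
cd = -10.0 * np.exp(-((wl - 222.0) / 8.0) ** 2)  # toy single-band spectrum
variant = perturb_spectrum(wl, cd, scale=0.95, offset=0.2, shift_nm=1.0, noise_sd=0.1, seed=1)
```

Each perturbed spectrum would then be deconvoluted (for example with BeStSel) and its secondary-structure estimates compared with those from the unperturbed spectrum; fixing `seed` makes the noisy variants reproducible.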
19.
Proteomics ; 24(8): e2300084, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38380501

ABSTRACT

Assigning statistical confidence estimates to discoveries produced by a tandem mass spectrometry proteomics experiment is critical to enabling principled interpretation of the results and assessing the cost/benefit ratio of experimental follow-up. The most common technique for computing such estimates is to use target-decoy competition (TDC), in which observed spectra are searched against a database of real (target) peptides and a database of shuffled or reversed (decoy) peptides. TDC procedures for estimating the false discovery rate (FDR) at a given score threshold have been developed for application at the level of spectra, peptides, or proteins. Although these techniques are relatively straightforward to implement, it is common in the literature to skip over the implementation details or even to make mistakes in how the TDC procedures are applied in practice. Here we present Crema, an open-source Python tool that implements several TDC methods of spectrum-, peptide- and protein-level FDR estimation. Crema is compatible with a variety of existing database search tools and provides a straightforward way to obtain robust FDR estimates.


Subjects
Algorithms , Peptides , Databases, Protein , Peptides/chemistry , Proteins/analysis , Proteomics/methods
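To make the TDC procedure concrete, here is a minimal spectrum-level sketch in plain Python: competition keeps one winner per spectrum, the FDR at each score threshold is estimated as (decoys + 1) / targets, and q-values are the running minimum of that estimate. It is a generic textbook version written for illustration, not Crema's API or code, and peptide- or protein-level procedures need additional steps not shown here.

```python
def competition(target_scores, decoy_scores):
    """Target-decoy competition: for each spectrum keep only its better match.

    Returns (score, is_decoy) pairs. Ties go to the target here, which is a
    simplification; implementations may break ties differently."""
    return [(t, False) if t >= d else (d, True)
            for t, d in zip(target_scores, decoy_scores)]


def tdc_qvalues(psms):
    """Estimate q-values for (score, is_decoy) pairs; higher scores are better.

    At each score threshold the FDR among accepted targets is estimated as
    (decoys + 1) / targets, capped at 1; a q-value is the minimum of that
    estimate over the current and all more permissive thresholds."""
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    i = 0
    while i < len(ranked):
        j = i
        # Consume all PSMs tied at this score before estimating the FDR.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1]:
                decoys += 1
            else:
                targets += 1
            j += 1
        fdr = min(1.0, (decoys + 1) / max(targets, 1))
        fdrs.extend([fdr] * (j - i))
        i = j
    qvalues = fdrs[:]
    for k in range(len(qvalues) - 2, -1, -1):
        qvalues[k] = min(qvalues[k], qvalues[k + 1])
    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qvalues)]


psms = [(10.0, False), (9.0, False), (8.0, False), (7.0, True), (6.0, False), (5.0, True)]
for score, is_decoy, q in tdc_qvalues(psms):
    print(score, "decoy" if is_decoy else "target", round(q, 3))
```

Accepting the targets whose q-value is at or below a chosen level (for example 0.01) gives the discoveries at that estimated FDR; decoys are listed above only to show how they drive the estimate.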
20.
BMC Genomics ; 25(1): 96, 2024 Jan 23.
Article in English | MEDLINE | ID: mdl-38262929

ABSTRACT

BACKGROUND: Angelica sinensis (Danggui), a renowned medicinal plant, has gained significant recognition for its therapeutic effects in treating a wide range of ailments. Genome information is a valuable resource that enables researchers to gain a deeper understanding of gene function. Recently, the availability of chromosome-level genomes for A. sinensis has opened up vast opportunities for exploring gene function. Integrating multiomics data can allow researchers to unravel the intricate mechanisms underlying gene function in A. sinensis and further enhance our knowledge of its medicinal properties. RESULTS: In this study, we used genomic and transcriptomic data to construct a coexpression network for A. sinensis. To annotate genes, we aligned them with sequences from various databases, such as NR, TAIR, TrEMBL, UniProt and SwissProt. For GO and KEGG annotations, we employed InterProScan and GhostKOALA. Additionally, gene families were predicted using iTAK, HMMER, OrthoFinder and KEGG annotation. To facilitate gene functional analysis in A. sinensis, we developed a comprehensive platform that integrates genomic and transcriptomic data with processed functional annotations. The platform includes several tools, such as BLAST, GSEA, Heatmap, JBrowse and Sequence Extraction. This integrated resource will enable researchers to explore the functional aspects of genes in A. sinensis more effectively. CONCLUSION: We developed a platform, named ASAP, to facilitate gene functional analysis in A. sinensis. ASAP (www.gzybioinformatics.cn/ASAP) offers a comprehensive collection of genome data, transcriptome resources and analysis tools, providing researchers with the data and tools needed for gene functional research in their projects.


Subjects
Angelica sinensis , Genomics , Databases, Protein , Gene Expression Profiling , Genetic Research
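The abstract does not say how the coexpression network was built (methods such as WGCNA are common). As a minimal, hypothetical sketch, the code below links genes whose expression profiles across samples have an absolute Pearson correlation at or above a chosen threshold; the threshold, function name and toy data are assumptions, not ASAP's implementation.

```python
import numpy as np


def coexpression_edges(genes, expression, threshold=0.9):
    """Gene pairs whose profiles have |Pearson r| >= threshold.

    `expression` is a genes x samples matrix. Genes with constant expression
    give undefined (NaN) correlations, which never pass the threshold (NumPy
    may emit a RuntimeWarning for them)."""
    r = np.corrcoef(np.asarray(expression, dtype=float))  # rows = genes
    edges = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            if abs(r[i, j]) >= threshold:
                edges.append((genes[i], genes[j], float(r[i, j])))
    return edges


genes = ["g1", "g2", "g3", "g4"]
expression = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],  # perfectly correlated with g1
    [4, 3, 2, 1],  # perfectly anti-correlated with g1
    [1, 3, 2, 4],  # r = 0.8 with g1, below the threshold
]
for a, b, r in coexpression_edges(genes, expression):
    print(a, b, round(r, 2))
```

Keeping strongly anti-correlated pairs as edges (by thresholding |r|) is a design choice; signed networks or soft-thresholding approaches such as WGCNA handle this differently.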